rsnc:ref,
version: 20240112(08.01)
authored: st. schwarz, FUB
#### content
- EXMARALDA
- audacity: how to anonymise interview audio
- explanations to the SES database
- SketchEngine import corpus
- ANNIS framework
- BERLANGDEV workflow
- BERLANGDEV media status
- class findings
please find here
a .docx / pdf version of the pages.
mdbook instance: you can also use the <print>
button at the top right to generate an up-to-date printed
version.
to make sure you are viewing the most recent version of the wrap up,
reload the pages by visiting: https://pinghook.dh-index.org?page=pfaff-corpusclass-overview
if you have trouble with the technical terms or expressions used, please consult the glossary at the end.
note: this method demands fewer technical skills, but transcribing
(= typing the text) takes longer. if you decide to use a more
technically demanding method, which allows for easy and fast
transcription, skip to section 2.2.
### preliminary
- open the original .pdf you want to transcribe and the partitur editor. it is best to have a parallel view of the .pdf and the partitur in horizontal split
- open the template <LLDM_exmaralda_basictemplate.exb>, which you can download from the HU box or here.
- this template already contains the (still empty) tiers needed for transcription and annotation.

### speakertable
- edit the speakertable to relabel the tiers:
if you need to insert an empty segment in the middle of the transcript (e.g. because you forgot to transcribe a word), you can split an event, which creates an empty segment.
you can also write the whole sentence into one segment and then split it,
as above, wherever you want by positioning your cursor at the
right position. the new segment is created exactly where your
cursor is, e.g. after the whitespace between two words if you
place it there. any word after that whitespace becomes the content
of the next segment, together with every word that follows, so you
have to repeat the step for each word in the sentence.
the reverse operation (combining segments) is also possible: select the
segments you want to combine (like cells in an excel table: not by
holding SHIFT, but by dragging over them with the mouse button pressed)
and choose <event:merge>.
note: this method requires some technical skills, but it
definitely saves effort in transcribing.
### preliminary
- open the original .pdf you want to transcribe and a simple text editor (preferably not word; use this one (VS Code), for example).
- it is best to have a parallel view of the .pdf and the editor in horizontal split
- download the template <LLDM_exmaralda_basictemplate.exb> from the HU box or here
### transcription
- transcribe (type) the text as it is written in the .pdf into a plain text file in the editor
- transcribe every written form, including struck-through words or phrases, i.e. every piece of information that could be analysed later on
- you can mark up information like this or use your own (consistent!) system. what matters is that you can later (see sec. 2.4) transfer your marked-up information into an annotation in the transcript, e.g. like here, where “an” was struck through in the original text:
gut erinnern an die Zeit in der Grundschule und _an-strike_ die ersten Jahre auf dem Gymnasium, in der immer viel abgeschrieben wurde
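
a minimal sketch of how such inline markup could later be pulled out of the plain text and turned into annotation candidates. the pattern `_word-label_` is an assumption modelled on the single example `_an-strike_`; if you use your own markup system, the regular expression has to be adapted. the function name `extract_markup` is made up for this illustration.

```python
import re

# Assumed inline markup, modelled on the one example `_an-strike_`:
# underscore, the written word, a hyphen, a label, underscore.
MARKUP = re.compile(r"_(?P<word>[^\s_-]+)-(?P<label>[^\s_-]+)_")

def extract_markup(text):
    """Return (plain_text, annotations); annotations are (word, label) pairs."""
    annotations = [(m.group("word"), m.group("label")) for m in MARKUP.finditer(text)]
    # Keep only the written word in the running text.
    plain = MARKUP.sub(lambda m: m.group("word"), text)
    return plain, annotations

line = "in der Grundschule und _an-strike_ die ersten Jahre"
plain, annotations = extract_markup(line)
print(plain)
print(annotations)
```

the plain text can then go into the speaker tier, while the (word, label) pairs tell you which tokens need an entry in an annotation tier.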
( |([,.;:!?()] )|[a-zäöüß](?=([,.;:?!()] )))
print("voila:")
print("dont forget to save your work!")
excerpt:
aber man sollte mit#nonstandard# Personen nach dem Charakter beurteilen und wenn man das nicht macht, dann bist du auf den#nonstandard case# falschen Weg und dann bekommst du auch die (...) nicht guten#style# Freunde
if everything works well the transcription should now include a lemma and pos-tag tier.
follow the steps in section 2.2 to get the annotation tiers and metadata scheme into your transcription. to edit the metadata and speakertable and assign the speakers, follow that guide.
{filename(childcode, e.g. MIM)}_{language(DE/EN)}_{textversion(WN/WE)}_{venue(SESB/RKO)}_{10/12}.txt
e.g.: MIM_DE_WN_SESB_12.txt
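
a small sketch for checking file names against this naming scheme before uploading. the assumption that a child code consists of exactly three capital letters rests only on the example MIM; the field names in the pattern (`child`, `level`, ...) are made up for this illustration and are not part of any tool.

```python
import re

# Pattern derived from the naming scheme above.
# Assumption: the child code is three capital letters (as in "MIM").
FILENAME = re.compile(
    r"^(?P<child>[A-Z]{3})_(?P<language>DE|EN)_(?P<textversion>WN|WE)"
    r"_(?P<venue>SESB|RKO)_(?P<level>10|12)\.txt$"
)

def parse_filename(name):
    """Return the parts of a valid file name as a dict, or None if it does not fit."""
    match = FILENAME.match(name)
    return match.groupdict() if match else None

print(parse_filename("MIM_DE_WN_SESB_12.txt"))
print(parse_filename("MIM_FR_WN_SESB_12.txt"))  # FR is not in the scheme
```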
once you have finished transcribing the primary text, it is necessary
to add further tiers (rows) for annotation.
you will first add a tier with a normalised version of the text as
content. for that:
- add a new tier
- choose to <copy events from> the speaker tier and label it [norm]
follows
open the database SES_database_by_tokens.xlsx in excel or
numbers (the database is about 26 MB; on a Mac, prefer Excel, as
processing will be faster than in Numbers).
in general, a tool such as OpenRefine is preferable
to excel or numbers for working with the table.
to filter the table rows for specific tokens, speakers etc.:
- open Daten > Filter (Data > Filter)
- click the dropdown arrow in the column you want to filter, e.g. p_speaker
- deselect the "select all" box (click on it; by default it is ticked)
- now no box should be ticked
- tick e.g. the speaker you want to filter for
- apply the filter
- you can apply several filters at once to limit the concordance to the language of the interview, the age, or whatever limitation you want
- if you want to filter in the token column, you can enter/search for a free-text token and then select what matches your search
- to turn a filter off, go back to the dropdown filter option of the column and remove the filter there (> filter entfernen / remove filter)
| column | explanation | example |
|---|---|---|
| p_interview | transcript | GCA |
| p_speaker | speaker | #GCA |
| p_token | token | Mach |
| p_lemma_SkE | sketch engine lemma | machen-v |
| p_lemma | only the lemma | machen |
| p_turn | turn, sentence | #GCA : 43 Mach ich die Arbeit die Schule c_NPV . |
| p_turn_preceding | the preceding turn | #INT : 42 ( activities_after_school ) was machst du nach der Schule , wenn du nicht hier bist ? |
| p_transcriptLine | transcript line of the token | 43 |
| m_feature_eval | empty evaluation column for your own research. you can use it as a selector for findings by setting it to TRUE or FALSE | FALSCH |
| m_free_col | empty evaluation column for your own research. you can use it as a selector for findings by setting it to TRUE or FALSE | 0 |
| t_tag_SkE | full german RFTag. the following columns separate this tag into its single items | VIMP.Full.2.Sg |
| t_PoS_ok | selector to switch if the tag is correct | 1 |
| t_PoS | PartOfSpeech | VIMP |
| t_category | NA | Full |
| t_funct | NA | - |
| t_case | NA | - |
| t_pers | NA | 2 |
| t_num | NA | Sg |
| t_gender | NA | - |
| t_tense | NA | - |
| t_mode | NA | - |
| part_L1 | participant L1 | G |
| part_sex | participant sex | f |
| part_age | participant age | 8 |
| part_CoB | participant country of birth | Greece |
| part_YiG | participant years in germany | 0.5 |
| part_YoSH | participant years of school in heritage country | 0 |
| part_LPM | participant language proficiency mother | kann deutsch |
| part_LPF | participant language proficiency father | kann deutsch |
| part_LUM | participant language use mother | greek |
| part_LUF | participant language use father | greek |
| part_LUS | participant language use siblings | greek |
| part_LUFR | participant language use friends | N.A. |
| c_NSM | nonstandard semantics | 0 |
| c_PAU | pause | 0 |
| c_NPV | nonstandard possessive | 1 |
| c_NNS | nonstandard not specified | 0 |
| c_NPR | nonstandard preposition | 0 |
| c_NAG | nonstandard agreement | 0 |
| c_0MD | zero modal | 0 |
| c_0SU | zero subject | 0 |
| c_NWO | nonstandard word order | 0 |
| c_0OB | zero object | 0 |
| c_0PR | zero preposition | 0 |
| c_COM | comment | 0 |
| c_NCM | nonstandard comparison | 0 |
| c_0AR | zero article | 0 |
| c_NVP | nonstandard VP | 0 |
| c_0VP | zero VP | 0 |
| c_NGN | nonstandard gender | 0 |
| c_0AU | zero auxiliary | 0 |
| c_0CP | zero copula | 0 |
| c_NEX | nonstandard existential | 0 |
| c_NRL | nonstandard relative | 0 |
| c_NAR | nonstandard article | 0 |
| c_NMD | nonstandard modal | 0 |
| c_0PT | zero predicate | 0 |
| c_NPE | nonstandard person | 0 |
| c_0RF | zero reflexive | 0 |
| c_NIO | nonstandard i.o. | 0 |
| c_NPS | nonstandard person | 0 |
| c_0PN | zero plural/numeral | 0 |
| c_NPO | nonstandard pronoun | 0 |
| c_0RL | zero relative | 0 |
| c_0EX | zero existential | 0 |
| c_NNN | nonstandard not specified | 0 |
| c_NCP | nonstandard copula | 0 |
| c_0RP | zero reflexive pronoun | 0 |
| c_0PD | zero predicate | 0 |
| c_NVC | nonstandard vocab | 0 |
| c_NEA | nonstandard extra article | 0 |
| c_NCN | nonstandard conditional | 0 |
as the table above shows, there are many possible filtering options
when working with the SES database.
you can run simple queries for token, lemma or PoS tag, or refine your
query by applying filters to metadata or coded features as well.
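
for those who prefer a script to the spreadsheet filter dialogue, here is a minimal sketch of the same idea: keep only the rows in which certain columns have certain values. the toy rows are built by hand from the two example turns in the column table (lines 42 and 43 of GCA); a real session would read a CSV export of SES_database_by_tokens.xlsx instead, and the helper name `filter_rows` is made up for this illustration.

```python
import csv
import io

# Toy extract built from the example turns in the column table above;
# a real session would read an exported CSV of the database instead.
SAMPLE = """p_speaker,p_token,p_transcriptLine
#INT,was,42
#INT,machst,42
#INT,du,42
#GCA,Mach,43
#GCA,ich,43
#GCA,die,43
#GCA,Arbeit,43
"""

def filter_rows(rows, **conditions):
    """Keep rows whose columns equal all given values (like stacked filters)."""
    return [r for r in rows if all(r[col] == val for col, val in conditions.items())]

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# one filter: everything said by speaker #GCA
gca = filter_rows(rows, p_speaker="#GCA")
print([r["p_token"] for r in gca])

# two filters at once: speaker and token
hits = filter_rows(rows, p_speaker="#GCA", p_token="die")
print(len(hits))
```

the same pattern extends to metadata columns (part_age, part_L1) and to the coded features (c_NPV = "1"), since every row of the database carries all of these columns.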
this is a short tutorial on how to import texts into Sketch Engine to create a corpus of your own. you can then explore this corpus with the SketchEngine exploration tools.
open the Sketch Engine login page via: https://auth.sketchengine.eu/#login and choose your affiliated institution. you can also create your own account or log in via google.
upload the files from the folder
version without header for sketchengine that you will find
in the HU-BOX. then start the compilation of your corpus. this will tag the texts with part-of-speech tags and lemmatize the words. more information on the tagset used (for german) can be found here.
your corpus is now ready to be explored. you can find all information on the query language and further guides in the Sketch Engine Help
find your way through https://corpus-tools.org/annis/,
install ANNIS on your system and try to import the zipped ANNIS SES
corpus that you find in the HU-box (folder:
sketch engine Work; naming scheme of the latest zip:
[datestamp]_SES_annis_tagged_corpus.zip).
the following only documents the process; you do not have to follow
these steps. just follow the instructions above to
install ANNIS on your system and import the zipped corpus.
- upload the files in the HU box folder version without header for SketchEngine to SketchEngine > create new corpus
- expert compiler settings > adapt docscheme to > sesCPT
- with that done you can already explore the SES corpus in the SketchEngine GUI using the built-in CQL (corpus query language) commands.
- download corpus (vertical)
- the corpus is now a database of token, PoS, lemma, tagged according to the GermanRF tagset used by SketchEngine
- process the database in conc-essai.R:
  - splits the PoS tag (scheme: x.x.x.x.x) into separate columns defining classes of PoS tags
  - writes single .xlsx files for each kid into a folder
- ANNIS preprocessing:
  - pepper: xls > treetagger format from the .xlsx files folder. parameter file
  - pepper: treetagger > annis graph format from the treetagger files folder. parameter file
  - zip annis graph files
  - upload annis.zip to ANNIS localhost server
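
to illustrate the tag-splitting step: a minimal sketch of how a dot-separated tag of the scheme x.x.x.x.x can be cut into a fixed number of fields. this is not the code of conc-essai.R (which is written in R and additionally assigns the fields to named columns); it only shows the positional split. padding missing fields with "-" and the default of five fields are assumptions based on the scheme and on the "-" entries in the database table.

```python
def split_tag(tag, width=5):
    """Split a dot-separated PoS tag into `width` fields, padding with '-'."""
    parts = tag.split(".")
    # Tags with fewer than `width` items get '-' for the missing positions.
    return parts + ["-"] * (width - len(parts))

print(split_tag("VIMP.Full.2.Sg"))
```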
please find here (link follows) an ANNIS server installation with the SES corpus ready to use. (! 20230904: the link is not yet freely available; use the link shared in moodle if you do not want to use your own local installation !)
this part will include a workflow description of how to upload content to BERLANGDEV and edit metadata.
| id | child | CHAT | sanscodes | audio | |
|---|---|---|---|---|---|
| 1 | GCA | 1 | 1 | 1 | 1 |
| 2 | GCB | 1 | 1 | 1 | 1 |
| 3 | GCC | 1 | 1 | 1 | 1 |
| 4 | GCD | 1 | 1 | 1 | 1 |
| 5 | GCE | 1 | 1 | 1 | 1 |
| 6 | GCF | 1 | 1 | 1 | 1 |
| 7 | GCG | 1 | 1 | 1 | 1 |
| 8 | GDA | 1 | 1 | 1 | 1 |
| 9 | GDB | 1 | 1 | 1 | 0 |
| 10 | GDC | 1 | 1 | 1 | 1 |
| 11 | GDD | 1 | 1 | 1 | 1 |
| 12 | GDE | 1 | 1 | 1 | 0 |
| 13 | GDF | 1 | 1 | 1 | 1 |
| 14 | TAA | 1 | 1 | 1 | 1 |
| 15 | TAB | 1 | 1 | 1 | 0 |
| 16 | TAC | 1 | 1 | 1 | 0 |
| 17 | TAD | 1 | 1 | 1 | 1 |
| 18 | TAE | 1 | 1 | 1 | 1 |
| 19 | TAF | 1 | 1 | 1 | 1 |
| 20 | TAG | 1 | 1 | 1 | 1 |
| 21 | TAH | 1 | 1 | 1 | 1 |
| 22 | TAI | 1 | 1 | 1 | 1 |
| 23 | TBB | 1 | 1 | 1 | 1 |
| 24 | TBC | 1 | 1 | 1 | 1 |
| 25 | TBD | 1 | 1 | 1 | 1 |
| 26 | TBE | 1 | 1 | 1 | 1 |
| 27 | TBF | 1 | 1 | 1 | 1 |
| 28 | TBG | 1 | 1 | 1 | 1 |
| 29 | TBH | 1 | 1 | 1 | 1 |
| 30 | TBI | 1 | 1 | 1 | 1 |
| 31 | TBK | 1 | 1 | 1 | 1 |
| 32 | TBL | 1 | 1 | 1 | 1 |
| 33 | TBM | 1 | 1 | 1 | 1 |
| 34 | TBN | 1 | 1 | 1 | 1 |
| 35 | TBO | 1 | 1 | 1 | 1 |
| 36 | TBP | 1 | 1 | 1 | 1 |
| 37 | TBQ | 1 | 1 | 1 | 1 |
| 38 | TBR | 1 | 1 | 1 | 1 |
| 39 | TBS | 1 | 1 | 1 | 1 |
| 40 | TBT | 1 | 1 | 1 | 1 |
| 41 | TBU | 1 | 1 | 1 | 1 |
| 42 | TBV | 1 | 1 | 1 | 1 |
brief overview of student findings exploring the corpus
| Student | Child code | Age | Prepositions | Articles | Conjunctions | Paraphrase with verb | Hesitation phenomena (pauses, repeated articles) |
|---|---|---|---|---|---|---|---|
| Greek children | NA | NA | NA | NA | NA | NA | NA |
| Laura | GCC | 9 | auf: 3, an: 1, in: 7, nach: 1, zu: 7 - zun: 1, hinter: 0, neben: 0, vor: 1 | NA | NA | NA | many pauses, frequent hesitation |
| NA | GDC | 8 | auf: 12, an: 1, in: 19 (instead of im: 1), nach: 5, zu: 1, hinter: 0, neben: 0, vor: 1 | NA | NA | NA | many pauses |
| NA | GCG | 9 | NA | NA | NA | NA | often pauses briefly to think when she is not sure what she will say next |
| NA | GDD | 9 | NA | NA | NA | NA | NA |
| Turkish children | NA | NA | NA | NA | NA | NA | NA |
| Laura | TAC | 12 | auf: 16, an: 2, in: 6, nach: 4, zu: 7, hinter: 0, neben: 2, vor: 1 | NA | NA | NA | few pauses or hesitations |
| NA | TBF | 12 | NA | NA | NA | NA | NA |
| NA | TAI | 13 | auf: 8, an: 1, in: 12, nach: 5, zu: 3, hinter: 1, neben: 3, vor: 1 | NA | NA | NA | many pauses, frequent hesitation |
| NA | TBB | 14 | NA | NA | NA | NA | NA |
| Student | Child code | Age | Prepositions | Noticing | Self-correction (content or form) | Interviewer | More information |
|---|---|---|---|---|---|---|---|
| Greek children | NA | NA | NA | NA | NA | NA | NA |
| Katharina | GDA | 11 | NA | 0016 bis 0022 (answers INT not transcripted)//GDA: Hier, das Maedchen und ein Jung basteln einen Schneemann. Das Maedchen macht ein Mohrruebe fuer Nase. Und hier gibt sie…@ ’ne…@ ’ne Stock 90 obj i.o.. //INT: was ist das? //GDA: Das hier? hm //INT: was macht man damit? //GDA: Machen sauber. //INT: Genau, n Besen./[später:] und nicht den Besen/–>Bedeutungsverhandlung: gemeinsam Semantik umschreiben und erfassen; dann Vorschlag, der aufgegriffen wird/ | 0194 [Drachen] wenn es ganz gut Luft ist, Luft gib’s, dann… /0237 Hier fragt ein Frau…@ den Schna- Schaffner wo geht da, der Zug | lacht viel, macht Späße, fragt freundlich nach, stimmt, genau/Recasts/0096 (answer not transcripted) Er faellt sich um. Hmh, fällt runter. (Übergeneralisierung reflexive Verben)/0104 (answer not transcripted) [der Junge] fangt. Ja, fängt. /0181 (answer not transcripted) Soll ich sagen, wohin sie gehört? Ja, dann ist da noch irgendwas, was dazu gehört./0182 (answer not transcripted) Hier sind so Kristall. Eis. Ja genau, Eis. Eiskristalle | NA |
| Turkish children | NA | NA | NA | NA | NA | NA | NA |
| Katharina | TAF | 13 | NA | Zeile 0059 (Chat file) Kopf/–> Noticing, Pause + direkter Vorschlag – lernt direkt etwas neues/0091 (Chat) Besen/–> Vorschlag, der direkt übernommen wird./0103 0104 (Chat) Mohrrübe/ | Sie setzt sich im einen Wagen und der Maedchen zieht sie- zieht er. (0068)/Ja, wenn du kein Fahrer- Fahrkarte hast, dann musst du wieder aussteigen. (0163)/weil, er traegt viel schwerer und er traegt (…) langs- ne bisschen leichter (0131) /da finden die- da find der Junge sein Vater und Mutter (0179)/–> Umstrukturierung Syntax auch für L1 typisch | Interviewer sagt v.a. ok, gut, hmh, prima, alles ist erlaubt, lacht, ermutigt bei langen Pausen, weiterzumachen//Nachfrage Zeile 0038 (Chat) Irrenhaus//Recast 0146 (answer from INT not transcipted)//der Apfel gehoert die Aepfeln. Und warum gehören die beiden zusammen? | Viele und lange, ungefüllte Pausen |
| NA | TBV | 14 | NA | NA | NA | NA | NA |
| NA | TBE | 13 | NA | NA | NA | NA | NA |
| NA | TBF | 12 | NA | NA | NA | NA | NA |
| NA | TBM | 13 | NA | NA | NA | NA | NA |
| NA | TBN | 14 | NA | NA | NA | NA | NA |
| Student | Child code | Age | Prepositions | Articles | Conjunctions | Hesitation phenomena (pauses, repeated articles) | Self-correction (content or form) |
|---|---|---|---|---|---|---|---|
| Greek children | NA | NA | NA | NA | NA | NA | NA |
| Miriam | GCA | 8 | auf: 3/an: 1 -> am: 1/in: 31/nach: 4/zu: 3/hinter: 1/neben:/vor: 0 | Reduced forms ///n: 1 -> ein/einen | und: 95/dann: 11/danach: 0/weil: 1 | many pauses, frequent hesitation | few self-corrections |
| NA | GCE | 11 | auf: 0/an: 0/in: 0/nach: 0/zu: 0/hinter: 0/neben: 0/vor: 0 | Reduced forms /// | und: 180/dann: 25/danach: 2/weil: 3 | few pauses or hesitations | more self-corrections |
| NA | GDE | 10 | auf: 31/an: 0/in: 49 -> inzu: 1/nach: 5/zu: 26 -> inzu: 1/hinter: 3/neben: 2/vor: 3 | Reduced forms ///n: 4 -> ein/einen//ne: 5 -> eine//s | und: 57/dann: 13/danach: 1/weil: 9 | in Germany for 10 years/no audio file | NA |
| NA | GDF | 11 | auf: 10/an: 0/in: 25/nach: 6/zu: 15/hinter: 2/neben: 2/vor: 2 | Reduced forms / | Und/Und dann/danach/weil | few pauses or hesitations | deliberate self-corrections |
| Turkish children | NA | NA | NA | NA | NA | NA | NA |
| Miriam | TAA | 13 | auf: 16/an: 2/in: 6/nach: 4/zu: 7/hinter: 0/neben: 2/vor: 1 | Reduced forms / | Und/Und dann/danach/weil | few pauses or hesitations | self-corrections of content, fewer of grammar |
| NA | TAD | 14 | auf: 2 -> aufm: 1/an: 0/in: 5 -> inne: 1/nach: 2/zu: 6/hinter: 1/neben: 2/vor: 0 | Reduced forms / | Und/Und dann/danach/weil | many pauses, frequent hesitation | few self-corrections |
| NA | TBC | 14 | auf: 14/an: 6/in: 9/nach: 2/zu: 16/hinter: 3 -> hinterher: 1/neben: 4/vor: 0 | Reduced forms / | Und/Und dann/danach/weil | many pauses, frequent hesitation | few self-corrections |
| NA | TBD | 13 | auf: 14/an: 0/in: 18/nach: 6/zu: 7/hinter: 0/neben: 1 -> daneben: 1/vor: 2 | Reduced forms / | Und/Und dann/danach/weil | few pauses or hesitations | few self-corrections |
| Carol | test-ok | NA | NA | NA | NA | NA | NA |
roughly: the number of occurrences of one (coded nonstandard) feature divided by the total number of instances of the feature (standard + nonstandard realisations), e.g.:
| token | instances | standard | nonstandard | normalised |
|---|---|---|---|---|
| schnee | 54 | 33 | 21 | 38.8888889 |
| | all | coded “FALSCH” | coded “1” (feature = TRUE) | percent (D2/B2\*100) |
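
the same calculation as a short sketch, using the numbers of the example row (21 nonstandard realisations out of 54 instances). the function name `normalised` is only chosen for this illustration.

```python
def normalised(nonstandard, instances):
    """Share of nonstandard realisations among all instances, in percent."""
    if instances == 0:
        raise ValueError("no instances of the feature")
    return nonstandard / instances * 100

# example row "schnee": 33 standard + 21 nonstandard = 54 instances
print(round(normalised(21, 54), 7))
```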
workflow:
the following is the output table of the multivariate analysis of a
frequency table of all feature codes over all target children.
the frequency table was exported from an ANNIS installation of the SES
corpus. the query for getting the proper results is:
`codetag = /c_.*/ & int = /T.*|G.*/ & #1 . #2`
this outputs all codes occurring in the transcripts and associates them
with the speaker, either T = any turkish or G = any greek child. with
that you get a frequency table looking like this (excerpt):
| featurecode | child | count |
|---|---|---|
| NPR | GCA | 16 |
| COM | TBR | 16 |
| COM | TAA | 15 |
| 0AR | TBL | 14 |
| NPR | TBT | 14 |
| 0AR | GCB | 14 |
| COM | TBL | 14 |
| 0AR | GDA | 13 |
| COM | TBM | 13 |
| NNS | TBQ | 13 |
| 0AR | TBS | 13 |
| COM | GCA | 13 |
| NPR | TBU | 12 |
| NPR | GCC | 12 |
| 0AR | TBT | 12 |
| COM | TBU | 12 |
| 0AR | TBU | 11 |
| NNS | GCB | 11 |
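
a minimal sketch of how such a frequency table can be summed up per feature and L1 group before modelling. it uses only the first six rows of the excerpt above and takes the first letter of the child code as the L1 marker (G = greek, T = turkish), as in the query. this is not the code of the analysis script linked below; the function name `totals_by_l1` is made up for this illustration.

```python
from collections import defaultdict

# First six rows of the excerpt above: (featurecode, child, count).
ROWS = [
    ("NPR", "GCA", 16), ("COM", "TBR", 16), ("COM", "TAA", 15),
    ("0AR", "TBL", 14), ("NPR", "TBT", 14), ("0AR", "GCB", 14),
]

def totals_by_l1(rows):
    """Sum the counts per (feature, L1); L1 is the first letter of the child code."""
    totals = defaultdict(int)
    for feature, child, count in rows:
        l1 = child[0]  # "G" = greek, "T" = turkish
        totals[(feature, l1)] += count
    return dict(totals)

totals = totals_by_l1(ROWS)
for key in sorted(totals):
    print(key, totals[key])
```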
visualised:
script source: https://github.com/esteeschwarz/HU-LX/blob/main/scripts/distribution-analysis.R
the applied lmer (linear mixed-effects regression) model (Bates et al. 2015) has
the formula:
count ~ feature + (1 | L1)
in words: we posited a main effect of feature and a random intercept for
L1. under this assumption, a significance of p < 0.05
was tested at 0AR (zero article), i.e. the coding of this feature (here)
depends significantly on the L1 of the target child.
IMPORTANT: this does not allow general statements about
the relation between the 0AR feature and L1, since the transcript corpus was not
coded/annotated exhaustively for every instance of each feature.
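
written out, a formula of the form count ~ feature + (1 | L1) corresponds to the usual random-intercept model. the notation below (the symbols β, u, ε and the indices i for an observation and j for its L1 group) is the standard textbook one and is not taken from the analysis script:

```latex
\[
\mathrm{count}_{ij} = \beta_0 + \beta_{\mathrm{feature}(i)} + u_{j} + \varepsilon_{ij},
\qquad u_{j} \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]
```

here β0 is the intercept, β_feature(i) the fixed effect of the feature code of observation i (the rows featureXXX in the output table), and u_j the random intercept of L1 group j.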
| coefficient | Estimate | Std. Error | df | t value | Pr(>\|t\|) |
|---|---|---|---|---|---|
| feature0AR | 5.324 | 2.778 | 333 | 1.916 | 0.056 |
| featureNPR | 4.286 | 2.786 | 333 | 1.538 | 0.125 |
| featureNAG | 3.353 | 2.817 | 333 | 1.19 | 0.235 |
| featureNNS | 2.8 | 2.792 | 333 | 1.003 | 0.317 |
| feature0SU | 2.571 | 2.777 | 333 | 0.926 | 0.355 |
| featureNGN | 1.893 | 2.786 | 333 | 0.679 | 0.497 |
| featureNPE | 2 | 2.957 | 333 | 0.676 | 0.499 |
| feature0PR | 1.682 | 2.799 | 333 | 0.601 | 0.548 |
| feature0OB | 1.261 | 2.797 | 333 | 0.451 | 0.652 |
| featureNSM | 1.167 | 2.794 | 333 | 0.418 | 0.677 |
| featureNWO | 1.067 | 2.828 | 333 | 0.377 | 0.706 |
| (Intercept) | 1 | 2.738 | 333 | 0.365 | 0.715 |
| feature0EX | 1 | 3.353 | 333 | 0.298 | 0.766 |
| feature0MD | 1 | 3.872 | 333 | 0.258 | 0.796 |
| featureNPO | 1 | 3.872 | 333 | 0.258 | 0.796 |
| featureNVP | 0.667 | 2.957 | 333 | 0.225 | 0.822 |
| featureNEX | 0.6 | 2.999 | 333 | 0.2 | 0.842 |
| feature0RF | 0.5 | 3.353 | 333 | 0.149 | 0.882 |
| featureNCM | 0.429 | 2.927 | 333 | 0.146 | 0.884 |
| feature0CP | 0.375 | 2.822 | 333 | 0.133 | 0.894 |
| feature0AU | 0.333 | 2.957 | 333 | 0.113 | 0.91 |
| feature0VP | 0.308 | 2.841 | 333 | 0.108 | 0.914 |
| featureNRL | 0.267 | 2.828 | 333 | 0.094 | 0.925 |
| featureNPV | 0.091 | 2.86 | 333 | 0.032 | 0.975 |
| feature0AP | 0 | 3.872 | 333 | 0 | 1 |
| feature0EL | 0 | 3.872 | 333 | 0 | 1 |
| feature0PD | 0 | 3.872 | 333 | 0 | 1 |
| feature0PN | 0 | 3.061 | 333 | 0 | 1 |
| feature0PT | 0 | 3.872 | 333 | 0 | 1 |
| feature0RL | 0 | 3.872 | 333 | 0 | 1 |
| feature0RP | 0 | 3.872 | 333 | 0 | 1 |
| featureNAR | 0 | 2.927 | 333 | 0 | 1 |
| featureNAU | 0 | 3.872 | 333 | 0 | 1 |
| featureNCA | 0 | 3.872 | 333 | 0 | 1 |
| featureNCJ | 0 | 3.872 | 333 | 0 | 1 |
| featureNCN | 0 | 3.872 | 333 | 0 | 1 |
| featureNCP | 0 | 3.872 | 333 | 0 | 1 |
| featureNEA | 0 | 3.872 | 333 | 0 | 1 |
| featureNIO | 0 | 3.872 | 333 | 0 | 1 |
| featureNMD | 0 | 3.872 | 333 | 0 | 1 |
| featureNNN | 0 | 3.872 | 333 | 0 | 1 |
| featureNPN | 0 | 3.872 | 333 | 0 | 1 |
| featureNPS | 0 | 2.957 | 333 | 0 | 1 |
| featureNVC | 0 | 3.872 | 333 | 0 | 1 |
some explanations of the terms used (maybe outdated) and other stuff useful to know:
| term used | german | english |
|---|---|---|
| leerzeichen, space, spacebar | whitespace | |
| return, enter, CR | senden, bestätigen, ausführen, zurück | |
| CMD, command | apple: befehlstaste | |
| CTRL, control, steuerung, STRG | apple: steuerungstaste | |
| ALT | apple: wahltaste | apple: option |
| SHIFT | apple: umschalten wtf. | |
| backspace | ? | |
| delete | ? | |
| extension | the part of the filename after the dot (.), which enables your operating system to recognize the file format and choose the appropriate application to open the file with, e.g. .mp3 for music, .jpg for pictures, .txt for plain text files, .docx for word documents. (the extension does not really matter: you can open any file from within any application with the \<open\> dialogue of that application. but to open a file by just clicking on it, it needs an extension to signal to the system which app to choose.) | |
| segmentation | in the partitur editor you can segment a transcript/text into several parts. the decision of what is part of a segment depends on the questions you have to the text or how you want to annotate it. segmentation per token allows for token limited annotation, segmentation per clause or sentence for wider range annotation. a segment has to be finished with a whitespace, i.e. the last character of a segment has to be a leerzeichen/whitespace/empty character. | |
| regex, regular expressions | reguläre ausdrücke | info / learn/try out regular expressions |
to dive deeper into the subject: https://de.wikipedia.org/wiki/Tastenkombination
Bates et al., “Fitting Linear Mixed-Effects Models Using lme4”. 2015. doi: 10.18637/jss.v067.i01